Genetic structure of Spirometra mansoni (Cestoda: Diphyllobothriidae) populations in China revealed by a Target SSR-seq method

Background In China, the plerocercoid of the cestode Spirometra mansoni is the main causative agent of human and animal sparganosis. However, the population genetic structure of this parasite remains unclear. In this study, we genotyped S. mansoni isolates with the aim to improve current knowledge on the evolution and population diversity of this cestode. Methods We first screened 34 perfect simple sequence repeats (SSRs) using all available omic data and then constructed target sequencing technology (Target SSR-seq) based on the Illumina NovaSeq platform. Next, a series of STRUCTURE. clustering, principal component, analysis of molecular variance and TreeMix analyses were performed on 362 worm samples isolated from 12 different hosts in 16 geographical populations of China to identify the genetic structure. Results A total of 170 alleles were detected. The whole population could be organized and was found to be derived from the admixture of two ancestral clusters. TreeMix analysis hinted that possible gene flow occurred from Guizhou (GZ) to Sichuan (SC), SC to Jaingxi (JX), SC to Hubei (HB), GZ to Yunnan (YN) and GZ to Jiangsu (JS). Both neighbor-joining clustering and principal coordinate analysis showed that isolates from intermediate hosts tend to cluster together, while parasites from definitive hosts revealed greater genetic differences. Generally, a S. mansoni population was observed to harbor high genetic diversity, moderate genetic differentiation and a little genetic exchange among geographical populations. Conclusions A Target SSR-seq genotyping method was successfully developed, and an in-depth view of genetic diversity and genetic relationship will have important implications for the prevention and control of sparganosis. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13071-022-05568-1.


Background
Parasitic tapeworms of humans and animals cause diseases of major socio-economic importance around the world [1]. Several of these remain neglected in terms of research and control, such as Spirometra mansoni (Cestoda: Diphyllobothriidae) [2]. The plerocercoid larvae of S. mansoni can lead to a serious parasitic zoonosis in humans known as sparganosis by invading the subcutaneous tissues, abdominal cavity, eye and central nervous system, causing local tissue damage, blindness, paralysis, and even death [3]. Human infection results mainly from the ingestion of uncooked frogs or snakes infected with plerocercoids, drinking water contaminated with copepods that have been infected with procercoids or direct contact of frog or snake flesh with an open wound [4]. Most cases of sparganosis have reported in Eastern and Southeastern Asia [5]. On a country basis, China has the largest number of sparganosis cases in the world, with > 1300 reported cases of human sparganosis in 26 out of 34 provinces, autonomous regions and municipal districts [6]. In recent years, the number of local cases have increased in several districts of China, and sparganosis has even been termed as an emerging zoonosis [4,7,8].
Knowledge of the extent of intraspecific genetic diversity of S. mansoni is not only useful for understanding the dynamics of individual infections but also valuable for the prevention and control of sparganosis. Several pioneer studies have investigated the genetic diversity of Spirometra tapeworms. Jeon and Eom [9] performed a genetic variability analysis of Spirometra species from Korea, China and Japan using two mitochondrial gene sequences (cytochrome c oxidase subunit 1 [cox1] and cytochrome b [cytb]). The study identified 148 (cox1) and 83 (cytb) haplotypes within 239 and 213 isolates, indicating that mitochondrial haplotypes of Spirometra erinaceieuropaei and Spirometra decipiens were found in the three Asian countries [9]. Based on cox1 sequences, Yamasaki et al. [10] suggested two distinct Spirometra species, Type I and Type II, are present in Asia, neither of which is close to the likely European "S. erinaceieuropaei." Kołodziej-Sobocińska et al. [11] found distinct genetic separation of the Polish and Chinese populations of Spirometra isolates based on cox1 and cytb, and suggested that isolates from Polish and Chinese populations should be two species. Using a global full-length DNA sequence dataset of cox1, Kuchta et al. [3] performed a comprehensive phylogenetic analysis of Spirometra tapeworms and suggested that there are at least six distinct lineages of the genus. In China, the genetic diversity of isolates of Spirometra tapeworms from different geographical locations and different hosts (frogs and snakes) have been investigated in recent years [12][13][14][15][16]. Although previous studies have provided valuable information that contribute to a better understanding of the genetic diversity of Spirometra tapeworms, there is room for improvement: (i) previous studies have relied on relatively small sample sizes with few representative isolates; (ii) the isolates used were collected mainly from intermediate hosts (frogs or snakes), with few samples from final hosts included; (iii) previous estimates of genetic diversity were based on one or only a few traditional mitochondrial markers (mostly cox1 and cytb); and (iv) no high-throughput sequencing method was applied. Therefore, our knowledge of the population diversity of S. mansoni obtained from different geographical areas of China is still fragmented.
In the study reported here, we explored the genetic diversity of S. mansoni using a high-throughput simple sequence repeats (SSRs) genotyping method. SSRs, also known as microsatellites, are highly variable repetitive elements with nucleotide motifs of 1-6 bp and are ubiquitous in most eukaryotic organisms [17]. SSRs have been verified as suitable markers for inferring genetic variance and population differences in cestode species [14,[18][19][20][21][22]. However, traditional analysis based on gel electrophoresis cannot distinguish base differences or changes correctly in SSR amplicons, often causing false positive or false negative results in SSR detection [23]. Recently, an increasing number of studies have validated that genome-wide SSRs, especially perfect SSRs, exhibit stable motifs and conserved corresponding flanking sequences [17,[24][25][26]. Therefore, the identification of perfect SSRs is critical in microsatellite analysis, such that amplification of the appropriate PCR products can be ensured in genetic research applications. In this study, we used a genotyping method named Target SSR-seq, which combines high-throughput sequencing with genome-wide perfect SSRs that exhibit stable motifs and flanking sequences. Target SSR-seq enables the genotyping of targeted SSR loci simultaneously in a large number of samples with high coverage, using a single Illumina lane [27]. Moreover, by adding sequencing adapters and dual barcode tags, the SSR genotypes were determined directly from the deep sequencing of PCR products [28].
In order to conduct a comprehensive genetic diversity analysis of S. mansoni isolates, we collected a total of 368 isolates from 76 different geographical locations in China, which encompassed 16 out of 26 endemic regions of human sparganosis in the country. Specifically, the following objectives were addressed: (i) development of a high-throughput SSR genotyping (Target SSR-seq) method sensitive enough to genotype S. mansoni; and (ii) exploration of the genetic structure of S. mansoni populations with broad representative samples. Xu et al. Parasites & Vectors (2022) 15:485

Ethical statement
This study was performed strictly based on the recommendations of the Guide for the Care and Use of Laboratory Animals of the National Health Commission of China. The protocol was approved by the Life Science Ethics Committee of Zhengzhou University (Permission No. 2020-0704).

Sample collection
Human sparganosis is endemic in 26 provinces/autonomous regions/municipalities in China [5]. From July 2013 to September 2020, we collected S. mansoni isolates in 16 of these regions. For the 10 endemic regions where no samples were collected, we surveyed wild frogs for S. mansoni infections in Heilongjiang, Jilin, Liaoning, Beijing, Hebei, Shandong and Qinghai (excluding Taiwan, Hong Kong and Macao), but found no parasite-positive frog. In total, 368 samples isolated from 12 different hosts in 76 geographical locations were included in this study (Fig. 1). Among the 368 samples, 318 were plerocercoids collected from frogs (Pelophylax nigromaculatus, Sylvirana latouchii, Fejervarya limnocharis, Odorrana margaretae and Boulengerana guentheri), 42 were plerocercoids collected from snakes (Zaocys dhumnades, Elaphe carinata and Elaphe taeniura) and eight were adults collected from felids (Panthera tigris tigris, P. tigris altaica, Prionailurus bengalensis and Felis catus). The precise locality, origin and date of collection of each sample are presented in Additional file 1: Table S1. The methods used to collect plerocercoids in frogs and snakes have been described previously [14,15]. In brief, frogs or snakes were euthanized using ethyl-ether anesthesia and then weighed and skinned. The presence of plerocercoids in muscles and subcutaneous tissues was carefully observed visually. Once identified, the worms were isolated using small scissors and forceps and placed in phosphate-buffered saline to count and observe their shapes and movements. The plerocercoid was identified as S. mansoni by molecular genotyping with bidirectional sequencing of the mitochondrial cox1 gene (Additional file 1: Table S2) [3,29]. The definitive hosts (cats and tigers) for collecting adult worms came from a wildlife zoo park in Changsha City, Hunan Province. All animals died of natural causes, and necropsies were carried out under approved protocols by the wildlife zoo park technical office. The intestinal tracts of the analyzed carnivores were carefully removed from each carcass and subsequently isolated by ligatures (pylorus and rectum). Examination of the intestinal content was performed as described by Arrabal et al. [30]. All collected samples were washed extensively in physiological saline, snap-frozen in liquid nitrogen, and then stored at -80 ℃ for further study.

Development of perfect SSR markers
A method for SSR genotyping using a target sequencing technology called Target SSR-seq was used to explore the genetic structure of S. mansoni populations in China (Fig. 2). The first step was to analyze the transcriptome data (https:// www. ncbi. nlm. nih. gov/ biopr oject/ PRJNA 761840) [31] to uncover genome-wide SSRs using the MISA-web tool [32]. The search was carried out using the following parameters: (i) > 9 repeat units for mononucleotide microsatellites; (ii) ≥ 4 repeat units for di-, tri-nucleotide microsatellites; (iii) ≥ 3 repeat units for tetra-, penta-and hexa-nucleotide microsatellites; and (iv) repeat length up to 100 bp. The poly-A and poly-T sequences at the terminal regions of the unique transcripts (UTs) were removed before searching. Second, the publicly available expressed sequence tags (ESTs) were searched and retrieved from the NCBI EST database. MISA (MIcroSAtellite Identification Tool) was also used to search for microsatellite loci in the EST sequences. The following parameters were used to identify perfect SSRs (i.e. those that exhibit stable motifs and have conserved corresponding flanking sequences): (i) SSR motif length < 50 bp; (ii) no INDELs, poly regions or SSR loci in the 150-bp flanking sequence; and (iii) even distribution in chromosomes [27]. Primers were designed for SSR loci by Primer3Plus with the following parameters: (i) primer lengths ranging from 18 to 27 bp; (ii) product lengths ranging from 100 to 300 bp; (iii) melting temperature (Tm) ranging from 57 °C to 63 °C; and (iv) GC-content ranging from 40 to 60%, with an optimal value of 50%. Before library construction, we needed to optimize these SSR markers twice. First, each SSR was independently tested and optimized by a single PCR amplification procedure consisting of: 1 cycle of 95 °C for 2 min; 11 cycles of 94 °C for 20 s, 63-58 °C (− 0.5 °C per cycle) for 40 s, 72 °C for 1 min; then 24 cycles of 94 °C for 20 s, 65 °C for 30 s, 72 °C for 1 min; with a final cycle at 72 °C for 2 min. SSR markers which amplified clear bands were selected for further multiplex PCR optimization. According to the multiplex PCR results, the composition and concentration of SSR markers in the panel of multiple PCR were adjusted and optimized to ensure that each SSR marker in the multiplex system could be amplified efficiently and specifically.

Library construction and high-throughput sequencing
Two rounds of PCR were performed to construct the library. In the first round, the optimized panels were used for multiple PCR amplification using the genomic DNA of each sample as a template. After quality control, the amplified products were mixed, and the quantity of amplified products of each SSR marker was ensured to be equal. In the second round, primers with index sequence were used to add 8 bp of specific label sequences compatible with the Illumina NovaSeq 6000 platform (Illumina Inc., San Diego, CA, USA) to the end of the library. After that, the amplified products of all samples were mixed in equal quantities, and the final FastTarget ™ (Genesky Biotechnologies Inc., Shanghai, China) sequencing library was recovered by gumming. The length distribution of fragments in the library was verified using an Agilent 2100 BioAnalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA). After the molar concentration of the library was accurately quantified, the Illumina NovaSeq 6000 platform was finally used for high-throughput sequencing in a 2 × 150-bp double-ended sequencing mode, and each SSR region was sequenced for at least 1000× coverage. In addition, positive and negative controls of PCR amplification were set to assay the repeatability of the Target SSR-seq method. Specifically, in the Target SSR-seq experiment, 3% of the total samples were selected using a doubleblind method to set up as positive control, and pure water was used as a negative control. To ensure quality, raw reads must be filtered to get clean reads with FastQC (https:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/), and subsequent analyses were based on clean reads. The total number of reads was calculated, and the samples were classified. A Perl script "SSRSeq count" str_count.pl (https:// github. com/ ccoo22/ SSRseq_ count) was used to process the clean reads to generate an SSR

Genetic diversity analysis
Genetic information statistics, including observed number of alleles (Na), the effective number of alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), and expected heterozygosity (He), were calculated by using the software POPGENE32 [33]. Polymorphic information content (PIC) was calculated by using a Perl script [34]. The formulas are as follows: where N is the number of samples, N Het is the number of homozygous samples, N Hom is the number of heterozygous samples, l is the allele locus and P i and P j are the population frequencies of the ith and jth alleles, respectively.

Genetic structure analysis
The STRU CTU RE v2.3 software program was used to infer the population structure [35]. The population number (K) was evaluated from 1 to 10. The optimal K value was determined by comparing the LnP (D) and ΔK based on the rate of change in LnP (D) [36]. Polysat was used to calculate the genetic distance between pairs of samples and principal coordinate analysis (PCoA) [37]. Nei's genetic distance was calculated for pairs of subpopulations based on allele frequency. The genetic similarity matrix and genetic distance matrix between subpopulations were obtained. Neighbor-joining (NJ) and 'ggtree' in the phangorn package in the R packet (R Foundation for statistical computing, Vienna, Austria) were respectively applied to construct and draw the evolutionary tree [38]. The genetic differentiation coefficient between subpopulations was calculated using the software GenAlEx using allele frequency [39], and population differentiation was measured together with an analysis of molecular variance (AMOVA). The software Treemix was used to infer multiple population splitting and mixing events based on genome-wide allele frequency data [40]. The results were based on the provided materials to draw the maximum likelihood tree of the population and to calibrate migration events in the phylogenetic tree.

Exploration of core SSR sets for species identification
To screen a minimal number of SSRs for distinguishing the maximal number of S. mansoni isolates, we used a script in Perl to calculate the discrimination power for SSRs with the following formula: discrimination power = the number of isolates showing unique genotypes/total isolates. High discrimination power refers to high saturation value and high SSR discernibility. A core set of SSRs with the best discernibility ability was obtained, and the saturation curve was plotted by discrimination power for SSRs. Core Hunter software was used to construct a core collection based on MR (modified Rogers Distance) [41]. MR treats each allele as a separate dimension, which is an improvement on the standard Euclidean distance. The sampling ratios of the software were set as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2 and 0.3, respectively. were tri-nucleotide repeats and 101 (2.36%) were tetranucleotide repeats (Additional file 1: Table S3). The distribution of SSR motifs revealed that the CT repeat was the most widespread di-nucleotide SSR motif (329/1024), and that the TCT repeat was the most widespread trimer among all tri-nucleotide SSR motifs (217/2760). Based on these identified SSRs, we obtained 98 SSRs after SSR filtering and perfect SSR search. Finally, 39 evenly distributed perfect SSRs were selected to test in Target SSR-seq after primer detection and optimization of the library enrichment system. Detailed information on SSR loci and primers is shown in Table 1.

Genotyping analysis of SSR-seq
During the high-throughput sequencing of 368 S. mansoni isolates, six samples failed due to the low quality of the specimens. The average sequencing depth per SSR capture in 309 samples (85.4%) was > 1000× (Fig. 3a). The clean ratio was > 0.8 for almost all of the samples (97%) (Fig. 3b), indicating the high quality and accuracy of the sequencing results. Moreover, four of the 39 SSRs exhibited high miss rates (> 20%), probably due to unstable flanking sequences. One SSR was observed as monomorphism in 362 varieties. Ultimately, a total of 34 polymorphic SSRs were obtained for the genotyping of 362 S. mansoni varieties.

Genetic diversity of S. mansoni populations
Target SSR-seq captured 170 alleles of 34 target SSR loci in 362 samples. Two motif types, trinucleotide and tetranucleotide, accounted for 88.2% and 11.8% of the 170 alleles, respectively. The number of SSR motif repeats ranged from three to seven, and 55 alleles (32.4%) contained four repeat units. The number of alleles per SSR locus varied from two to 18, with an average of five; however, the average effective number of alleles (Ne) was 2.4763, indicating a non-uniform distribution of alleles in the population ( For the genetic diversity analysis, we first explored the genetic differences among the 16 geographical populations. The NJ tree based on Nei's unbiased genetic distance demonstrated that the 16 geographical populations were clustered into three genetic groups (Fig. 4a). The first group mainly included populations from central (Hubei and Henan provinces) and eastern (Anhui, Jiangsu, Jiangxi, and Shanghai) China, with the exception of the Chongqing and Guizhou populations. The second group contained three southwestern populations (Guangxi, Yunnan and Sichuan), three southern populations (Guangdong, Hainan and Hunan) and one population from eastern China (Zhejiang). The third group only consisted of the Fujian (FJ) population from southeastern China. PCoA showed a similar clustering pattern to that of phylogenetic inference (Fig. 4b). Within the PCoA plot, the first principal coordinate accounted for 52.77% of the total variation, while the second principal coordinate accounted for 26.57%. Populations from geographically close regions showed similar distributions in the PCoA plot. However, the FJ population is located far away from the other populations, suggesting larger genetic variation within the FJ population. Next, genetic differences of tapeworms isolated from different hosts were explored (Fig. 4c). Clustering analysis showed that the samples from different hosts were clustered into three genetic groups. The first group included isolates from four frogs (P. nigromaculatus, F. limnocharis, O. margaretae and B. guentheri) and two cats (F. catus and P. bengalensis). The second group contained isolates from one frog (S. latouchii), two snakes (Z. dhumnades and E. carinata) and one tiger (P.t. tigris). The third group included isolates from one snake (E. taeniura) and one tiger (P.t. altaica). The PCoA analysis showed 54.21% and 30.5% explained variance of PCoA1 and PCoA2, respectively (Fig. 4d). The locations of the definitive hosts (cats and tigers) were scattered and far from each other, while the intermediate hosts (frogs and snakes) were concentrated in the middle of the coordinate axis and close to each other, indicating greater genetic variety among the definitive hosts and less genetic variety among intermediate hosts. Moreover, in order to investigate the detailed genetic diversity of each isolate, we performed a phylogenetic analysis using all 362 samples. The unrooted NJ tree revealed three main clusters (Fig. 5). Cluster 1 was the main clade and included 303 samples; cluster 2 and cluster 3 included 33 and 26 samples, respectively.

Genetic structure of S. mansoni in China
The STRU CTU RE software program and Evanno's correction results indicated that 362 isolates were divided  T_43  GAG  3  77  91  15  5  264  F: TAC TGG TCT CTC GGG CAA GA   R: GGT CCG TAG TCG TCG TCA TC   T_44  TCC  3  118  138  21  7 Fig. 1 for abbreviations of geographical locations). Clearly, the STRU CTU RE analysis did not divide the 16 populations according to their geographical origin. In contrast, each population showed a mixed membership to the inferred clusters. Isolates from AH (n = 17), CQ (17), HB (17), JS (17) and SH (3) revealed one prominent genotype, whereas the remaining isolates indicated the admixture of two genotypes (Fig. 6c). Moreover, the PCoA analysis of 362 samples showed a similar clustering pattern to the STRU CTU RE analysis (Fig. 7). All isolates are generally clustered into two main groups (Pop1 and Pop2) with partial overlap among several samples.

Population differentiation of S. mansoni in China
The AMOVA analysis of 34 SSR genotypes in 362 samples indicated that the maximum variation of 94.40621% resulted from differences within populations, while the remaining variation came from "among groups" and "among populations within groups, " accounting for 2.13143% and 3.46236% of the total variance, respectively (  Fig. S1). To infer migration events between populations, a maximum likelihood tree was built using Tree-Mix (Fig. 8). Consistently, the tree topology indicated that all of the geographical populations clustered into three main branches. A large number of migration events were identified among these S. mansoni populations. One Fig. 5 Unrooted NJ tree of 362 Spirometra mansoni isolates. Red, blue and pink branches indicate cluster 1, cluster 2 and cluster 3, respectively migration that was detected began from the GZ population to the SC population with a strong migratory signal (migration weight: 0.6). In addition, four migrations with weak signals were detected, including one event that began from SC to JX (migration weight: 0.25), one that began from SC to HB (migration weight: 0.3), one that began from GZ to YN (migration weight: 0.5) and one that began from GZ to JS (migration weight: 0.45).

Core SSR set identification and core samples analysis
At the end of the study, we intended to explore a core SSR set that will be more suitable and precise for use in the identification of Spirometra isolates through further screening of all identified 34 polymorphic SSRs. As a result, a set of 23 core SSRs that could distinguish 91.1% of 362 S. mansoni varieties was identified ( Fig. 9; Additional file 1: Table S4). Consistent with the above finding, structure analysis based on 23 core SSRs classified the 362 isolates into two populations (Additional file 1: Fig. S2). The PCoA analysis also distinguished the two populations, with PCoA1 explaining 15.37% and PCoA2 explaining 13.39%, respectively (Additional file 1: Fig. S3). The AMOVA analysis showed that the maximum variation mainly came from within populations. (Additional file 1: Table S5). The pairwise F st between Pop1 and Pop2 was 0.052. Therefore, the set of 23 core SSRs was sufficient in representing the genetic diversity and identifying Chinese S. mansoni varieties. In addition, considering the huge geographical areas of China, it is hard to sample enough isolates to represent the whole genetic background of Chinese S. mansoni populations. Therefore, we want to screen core sample sets in each geographical population based on our collected isolates so that these core samples can represent core or backbone varieties of each geographical population with minimum number of samples. Finally, a total of 76 strains were selected as the core samples of Chinese S. mansoni under the sampling ratio of 0.2 and MR of 0.43 (Additional file 1: Table S6).

Discussion
Simple sequence repeats exist extensively throughout eukaryotic genomes and are widely used for estimating genetic diversity and spatial structure of populations [1]. Traditional microsatellite analysis based on gel electrophoresis usually cannot distinguish base  Fig. 1). a Plot of the delta K values (population number) generated by the STRU CTU RE software program. b STRU CTU RE plots of 362 individuals grouped by the Q-matrix (estimated membership coefficient for each sample) at K = 2. Each isolate is represented by a vertical line, partitioned into the colored segments representing the tapeworm estimated membership fractions in K. The same color indicates that the isolates belong to the same group. Different colors for the same isolate indicate the percentage of the genotype shared with each group. c STRU CTU RE plots of individuals according to 16 geographical populations differences or changes correctly in SSR amplicons [23]. In addition, the experimental process of the traditional SSR genotyping method is extremely complicated and time-consuming. Therefore, the research community needs the development of novel methods for SSR discovery and genotyping that require less effort while delivering high accuracy and efficiency. To our knowlegde, our study is the first to use Target SSR-seq in genetic research on medical tapeworm populations. In detail, most of the selected Spirometra samples (85.4%) had a > 1000× average sequencing depth, indicating high sequencing depth in this population genetics study. Only six of the 368 samples failed to be genotyped due to the quality of the samples, suggesting high genotyping efficiency. Additionally, the number of PCR cycles in this study was 11, while traditional genotyping methods usually reach up as high as 40 cycles; taking into account that the higher the number of PCR cycles, the higher the possibility of incurring DNA polymerase slippage, the low number of PCR cycles in our study indicate that the PCR quality was highly reliable. Moreover, the Target SSR-seq method processed our multitarget SSR loci (34 SSRs) in a large number of samples  (368 samples) in a short time, confirming that this method has the advantages of higher accuracy regarding genotyping results and shorter time consumption (Table 5). Therefore, the Target SSR-seq method succeeds in providing high-throughput SSR genotyping with high accuracy and efficiency. It not only achieved accurate genotyping of S. mansoni populations in the present study, but it also can be widely extended to genetic diversity analyses of other parasites. We explored genetic differences among S. mansoni isolates from different geographical populations as well as from different host populations. Clustering analysis supported three genetic groups in the 16  When considering the detailed genetic diversity of each isolate, although three main clusters were revealed in the phylogenetic tree, no obvious geographical clusters were found, indicating that mixed memberships exist in these geographical populations. In fact, as for most parasitic helminths, the geographical distance is rarely the only determinant of the degree of genetic similarity or difference between populations, and the migration of hosts must also be considered [42]. Both the NJ clustering and PCoA analysis showed that isolates from intermediate hosts (snakes and frogs) tended to cluster together, probably because certain snakes and frogs have similar habitats with overlapping scopes of activities, thus resulting in less geographical isolation and genetic difference of their parasitizing tapeworms [43]. In contrast, parasites isolated from definitive hosts (cats and tigers) have fewer opportunities for gene exchange, thus leading to greater genetic difference. The influence of hosts on parasite genetic polymorphism is likewise limited by other conditions, such as habitat, activity range and whether the migration time of definitive hosts coincides with the period of infection [44]. Therefore, still more numerous factors need to be considered to infer the actual causes of gene exchange and genetic polymorphism of S. mansoni. STRU CTU RE analysis divided all S. mansoni isolates into two main groups (Pop1 and Pop2), which is consistent with previous studies using multi-gene markers (cytb, cox1, rrnS, and 28S rDNA D1) and traditional SSRs [13,14]. However, the STRU CTU RE analysis did not divide the 16 populations according to their geographical origin; in contrast, each population showed a mixed membership to the inferred clusters. With the exception of isolates from five geographical populations (AH, CQ, HB, JS and SH) that revealed one prominent genotype, the remaining isolates showed two main genotypes. In summary: (i) there are two genotypes in the Chinese S. mansoni population, and both genotypes have sympatric distribution in most of geographical populations, with genotype 1 being a prevalent genotype; and (ii) there are two sub-populations: Group 1 and Group 2 are verified, with Group 1 mainly containing isolates from east and central China, and Group 2 containing isolates from south and southwest China.
The F st value was used to measure genetic differentiation among geographical populations [45]. Generally, an F st value within the range of 0-0.05 indicates little genetic differentiation; 0.05-0.15, moderate differentiation; 0.15-0.25, large differentiation; and > 0.25, very large genetic differentiation [46]. The pairwise F st between Pop1 and Pop2 was estimated at 0.066, indicating moderate genetic differentiation between the two sub-populations. When considering the 16 geographical populations, except for Fst values between ZJ and AH, ZJ and CQ, ZJ and GZ, SC and GZ, ZJ and HuN, and SC and HuN, respectively, most of the F st values were fell within the range of 0.05-0.15, also indicating moderate genetic differentiation. In addition, we observed higher pairwise F st values between HN and the other 15 geographical populations, possibly due to the isolation of HN island from the mainland, leading to difficulties in gene exchange. Consistent with the results of estimated F st , the AMOVA analysis estimated moderate genetic variation of S. mansoni in China. Generally, genetic variation among populations is highly affected by genetic drift, gene flow, mutations, selection and long-term evolution [47]. Of the 16 geographical populations identified in this study, five migration events were detected in the TreeMix analysis, namely migration began from GZ to SC, SC to JX, SC to HB, GZ to YN and GZ to JS, indicating that there was some small amount of genetic exchange among the geographical populations. Usually, the populations with frequent gene exchange have a lower degree of genetic differentiation, whereas populations with low frequent gene exchange will exhibit greater genetic differentiation [48]. Gene exchange between isolates of S. mansoni is not very frequent, which may also account for the moderate genetic differentiation between populations. Last but not least, a core set of 23 SSRs were screened further, and a core set of S. mansoni samples in each geographical population was successfully generated using the core set of 23 SSRs. Finally, a total of 76 isolates were collected thatcan represent 91.1% genetic diversity of all 362 samples. The establishment of core SSRs and core samples is essential for the optimal management and use of genetic resources of S. mansoni, and they provide a strong foundation for establishing core SSRs and core samples for the Spirometra species.

Conclusions
In this study, we successfully established a Target SSRseq method containing 34 perfect SSR loci for genotyping S. mansoni isolates. The Target SSR-seq method provides high-throughput SSR genotyping with high accuracy and efficiency. It enables accurate genotyping of S. mansoni populations and can be widely extended to genetic diversity analyses of other parasites. The results of our comprehensive genetic diversity study based on this newly established method suggest that S. mansoni samples in China have high genetic diversity and moderate genetic differentiation, and that little genetic exchange exists among geographical populations. In future studies, more populations of Spirometra isolates across worldwide geographical ranges are needed to reveal the exact genetic patterns of this medical tapeworm and discover the underlying mechanisms generating such genetic variation patterns.
Additional file 1: Figure S1. Estimates of pairwise F st between S. mansoni populations. Figure S2. Estimated genetic structure of S. mansoni in China as inferred by the STRU CTU RE software on the basis of the data on a core set of 23 SSRs. Figure S3. Principal coordinate analysis (PCoA) describing the relationships of 362 S. mansoni isolates on the basis of the data on a core set of 23 SSRs. Table S1. Information of origin, locality and date of collection for 368 S. mansoni isolates. Table S2. Primers of cytochrome c oxidase subunit 1 (cox1) gene. Table S3. SSR mining in S. mansoni. Table S4. Characteristics of a core set of 23 SSR markers. Table S5. Analysis of molecular variance (AMOVA) of the populations of S. mansoni based on a core set of 23 SSRs. Table S6. Core samples of S. mansoni identified in each geographical population of China.